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Deducing the structure of neural circuits is one of the central 
»^ . problems of modern neuroscience. Recently-introduced calcium fluo- 

rescent imaging methods permit experimentalists to observe network 
activity in large populations of neurons, but these techniques provide 
Ph ' only indirect observations of neural spike trains, with limited time 

resolution and signal quality. In this work we present a Bayesian ap- 
proach for inferring neural circuitry given this type of imaging data. 
C^ ' We model the network activity in terms of a collection of coupled hid- 

fj^ I den Markov chains, with each chain corresponding to a single neuron 

in the network and the coupling between the chains reflecting the net- 
work's connectivity matrix. We derive a Monte Carlo Expectation- 
Maximization algorithm for fitting the model parameters; to obtain 
f^f^ • the sufficient statistics in a computationally-efficient manner, we in- 

^vj ' troduce a specialized blockwise-Gibbs algorithm for sampling from 

^S] ■ the joint activity of all observed neurons given the observed fluo- 

^^ ' rescence data. We perform large-scale simulations of randomly con- 

nected neuronal networks with biophysically realistic parameters and 
^-^ ■ find that the proposed methods can accurately infer the connectivity 

in these networks given reasonable experimental and computational 
constraints. In addition, the estimation accuracy may be improved 
significantly by incorporating prior knowledge about the sparseness 
of connectivity in the network, via standard Li penalization methods. 
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C^ ' 1. Introduction. Since Ramon y Cajal discovered that the brain is a rich 

and dense network of neurons [Ramon y Cajal (1904, 1923)], neuroscientists 
have been intensely curious about the details of these networks, which are 
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believed to be the biological substrate for memory, cognition and perception. 
While we have learned a great deal in the last century about "macro-circuits" 
(the connectivity between coarsely-defined brain areas), a number of key 
questions remain open about "micro-circuit" structure, that is, the connec- 
tivity within populations of neurons at a fine-grained cellular level. Two 
complementary strategies for investigating micro-circuits have been pursued 
extensively. Anatomical approaches to inferring circuitry do not rely on ob- 
serving neural activity; some recent exciting examples include array tomog- 
raphy [Micheva and Smith (2007)], genetic "brainbow" approaches [Livet 
et al. (2007)], and serial electron microscopy [Briggman and Denk (2006)]. 
Our work, on the other hand, takes a functional approach: our aim is to infer 
micro-circuits by observing the simultaneous activity of a population of neu- 
rons, without making direct use of fine-grained anatomical measurements. 

Experimental tools that enable simultaneous observations of the activ- 
ity of many neurons are now widely available. While arrays of extracellular 
electrodes have been exploited for this purpose [Hatsopoulos et al. (1998); 
Harris et al. (2003); Stein et al. (2004); Santhanam et al. (2006); Luczak 
et al. (2007)], the arrays most often used in vivo are inadequate for in- 
ferring monosynaptic connectivity in large populations of neurons, as the 
inter-electrode spacing is typically too large to record from closely neighbor- 
ing neurons;^ importantly, neighboring neurons are more likely connected 
to one another than distant neurons [Abeles (1991); Braitenberg and Schuz 
(1998)]. Alternately, calcium-sensitive fiuorescent indicators allow us to ob- 
serve the spiking activity of on the order of 10^ neighboring neurons [Tsien 
(1989); Yuste et al. (2006); Cossart, Aronov and Yuste (2003); Ohki et al. 
(2005)] within a micro-circuit. Some organic dyes achieve sufficiently high 
signal-to-noise ratios (SNR) that individual action potentials (spikes) may 
be resolved [Yuste et al. (2006)], and bulk-loading techniques enable ex- 
perimentalists to simultaneously fill populations of neurons with such dyes 
[Stosiek et al. (2003)]. In addition, genetically encoded calcium indicators 
are under rapid development in a number of groups, and are approaching 
SNR levels of nearly single spike accuracy as well [Wallace et al. (2008)]. 
Microscopy technologies for collecting fiuorescence signals are also rapidly 
developing. Cooled CCDs for wide-field imaging (either epifluorescence or 
confocal) now achieve a quantum efficiency of ~90% with frame rates up to 
60 Hz or greater, depending on the field of view [Djurisic et al. (2004)]. For 
in vivo work, 2-photon laser scanning microscopy can achieve similar frame 



''it is worth noting, however, that multielectrode arrays which have been recently de- 
veloped for use in the retina [Segev et al. (2004); Litke et al. (2004); Petrusca et al. 
(2007); Pillow et al. (2008)] or in cell culture [Lei et al. (2008)] are capable of much denser 
sampling. 



INFERRING NEURONAL CONNECTIVITY FROM CALCIUM IMAGING 3 

rates, using either acoustic-optical deflectors to focus light at arbitrary lo- 
cations in three-dimensional space [Iyer, Hoogland and Saggau (2006); Sa- 
lome et al. (2006); Reddy et al. (2008)] or resonant scanners [Nguyen et al. 
(2001)]. Together, these experimental tools can provide movies of calcium 
fluorescence transients from large networks of neurons with adequate SNR, 
at imaging frequencies of 30 Hz or greater, in both in vitro and in vivo 
preparations. 

Given these experimental advances in functional neural imaging, our goal 
is to develop efficient computational and statistical methods to exploit this 
data for the analysis of neural connectivity; see Figure 1 for a schematic 



Dads' lii«B»-sca«e catawm n«w»s«onca' incww 



Eidrad spto Smas 




/* 









^^^^*B5r 



Outxit rrfsfTHl ns<wa(1< mxlsi 






2 9* 





sg ■<» W flP lOQ 



Fig. 1. Schematic overview. The raw observed data is a large-scale calcium fluorescence 
movie, which is pre-processed to correct for movement artifacts and find regions-of-in- 
terest, that is, putative neurons. [Note that we have omitted details of these important 
preprocessing steps in this paper; see, for example, Cossart, Aronov and Yuste (2003); 
Dombeck et al. (2007) for further details.] Given the fluorescence traces Fi (i) from each 
neuron, we estimate the underlying spike trains (i.e., the time series of neural activity) 
using statistical deconvolution methods. Then we estimate the parameters of a network 
model given the observed data. Our major goal is to obtain an accurate estimate of the 
network connectivity matrix, which summarizes the information we are able to infer about 
the local neuronal microcircuit. (We emphasize that this illustration is strictly schematic, 
and does not correspond directly to any of the results described below.) This figure adapted 
from personal communications with R. Yuste, B. Watson and A. Packer. 
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overview. One major challenge here is that calcium transients due to action 
potentials provide indirect observations, and decay about an order of mag- 
nitude slower than the time course of the underlying neural activity [Yuste 
et al. (2006); Roxin, Hakim and Brunei (2008)]. Thus, to properly analyze 
the network connectivity, we must incorporate methods for effectively de- 
convolving the observed noisy fluorescence signal to obtain estimates of the 
underlying spiking rates [Yaksi and Friedrich (2006); Greenberg, Houwel- 
ing and Kerr (2008); Vogelstein et al. (2009)]. To this end, we introduce 
a coupled Markovian state-space model that relates the observed variables 
(fluorescence traces from the neurons in the microscope's field of view) to the 
hidden variables of interest (the spike trains and intracellular calcium con- 
centrations of these neurons), as governed by a set of biophysical parameters 
including the network connectivity matrix. As discussed in [Vogelstein et al. 
(2009)], this parametric approach effectively introduces a number of con- 
straints on the hidden variables, leading to significantly better performance 
than standard blind deconvolution approaches. Given this state-space model, 
we derive a Monte Carlo Expectation-Maximization algorithm for obtaining 
the maximum a posteriori estimates of the parameters of interest. Standard 
sampling procedures (e.g., Gibbs sampling or sequential Monte Carlo) are 
inadequate in this setting, due to the high dimensionality and nonlinear, 
non-Gaussian dynamics of the hidden variables; we therefore develop a spe- 
cialized blockwise-Gibbs approach for efficiently computing the sufficient 
statistics. This strategy enables us to accurately infer the connectivity ma- 
trix from large simulated neural populations, under realistic assumptions 
about the dynamics and observation parameters. 

2. Methods. 

2.1. Model. We begin by detailing a parametric generative model for the 
(unobserved) joint spike trains of all N observable neurons, along with the 
observed calcium fluorescence data. Each neuron is modeled as a generalized 
linear model (GLM). This class of models is known to capture the statisti- 
cal firing properties of individual neurons fairly accurately [Brillinger (1988); 
Chornoboy, Schramm and Karr (1988); Brillinger (1992); Plesser and Gerst- 
ner (2000); Paninski et al. (2004); Paninski (2004); Rigat, de Gunst and van 
Pelt (2006); Truccolo et al. (2005); Nykamp (2007); Kulkarni and Paninski 
(2007); Pillow et al. (2008); Vidne et al. (2009); Stevenson et al. (2009)]. 
We denote the ith. neuron's activity at time t as ni{t): in continuous time, 
ni{t) could be modeled as an unmarked point process, but we will take 
a discrete-time approach here, with each ni{t) taken to be a binary random 
variable. We model the spiking probability of neuron i via an instantaneous 
nonlinear function, /(•), of the filtered and summed input to that neuron at 
that time, Ji{t). This input is composed of the following: (i) some baseline 
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value, bi] (ii) some external vector stimulus, /S"^^*(t), that is linearly filtered 
by ki] and (iii) spike history terms, hij{t), encoding the influence on neuron i 
from neuron j, weighted by Wiji 



N 



(1) n,(t)~BernouUi[/(Ji(i))], 






To ensure computational tractability of the parameter inference problem, 
we must impose some reasonable constraints on the instantaneous nonlinear- 
ity /(•) (which plays the role of the inverse of the link function in the stan- 
dard GLM setting) and on the dynamics of the spike-history effects hij (t) . 
First, we restrict our attention to functions /(•) which ensure the concavity 
of the spiking loglikelihood in this model [Paninski (2004); Escola and Panin- 
ski (2011)], as we will discuss at more length below. In this paper we use 



(2) 



/(J) = P[n > I n ~ Poiss(e'^A)] = 1 - exp[-e'^A] 



(Figure 2), where the inclusion of A, the time step size, ensures that the 
firing rate scales properly with respect to the time discretization; see [Escola 
and Paninski (2011)] for a proof that this /(•) satisfies the required concav- 
ity constraints. However, we should note that in our experience the results 
depend only weakly on the details of /(•) within the class of log-concave 
models [Li and Duan (1989); Paninski (2004)] (see also Section 3.4 below). 
Second, because the algorithms we develop below assume Markovian dy- 
namics, we model the spike history terms as autoregressive processes driven 




Fig. 2. A plot of the firing rate nonlinearity f{J) used in our simulations. Note that the 
firing rate saturates at 1/A, because of our Bernoulli assumption (i.e., the spike count 
per bin is at most one). Here the binwidth A = (60 Hz)~^ . The horizontal gray line in- 
dicates 5 Hz, the baseline firing rate for most of the simulations discussed in the Results 
section. 
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by the spike train nj{t): 

(3) h,,{t) = (1 - A/r^.)/i.,(t - A) + n,{t - A) + 4VAe,'^.(t), 

where r/*- is a decay time constant, a!^, is a standard deviation parame- 



ter, V A ensures that the statistics of this Markov process have a proper 
Ornstein-Uhlenbeck hniit as A — )• 0, and throughout this paper, e denotes 
an independent standard normal random variable. Note that this model gen- 
eralizes [via a simple augmentation of the state variable hij{t)] to allow each 
neuron pair to have several spike history terms, each with a unique time 
constant, which when weighted and summed allow us to model a wide va- 
riety of possible post-synaptic effects, including bursting, facilitating, and 
depressing synapses; see [Vogelstein et al. (2009)] for further details. We re- 
strict our attention to the case of a single time constant r^^- per synapse here, 
so the deterministic part of hij{t) is a simple exponentially-filtered version 
of the spike train nj{t). Furthermore, we assume that rj^- is the same for 
all neurons and all synapses, although, in principle, each synapse could be 
modeled with its unique r/!-. We do that both for simplicity and also because 
we find that the detailed shape of the coupling terms hij{t) had a limited 
effect on the inference of the connectivity matrix, as illustrated in Figure 12 
below. Thus, we treat rj^ and a^j as known synaptic parameters which are 
the same for each neuron pair (i, j), and denote them as Th and ah hereafter. 
We chose values for Th and ah in our inference based on experimental data 
[Lefort et al. (2009)]; see Table 1 below. Therefore, our unknown spiking 
parameters are {wj, ki,bi}i<N, with Wj = {wn, . . . , tfjAf)- 

The problem of estimating the connectivity parameters w = {wj}j<7v in 
this type of GLM, given a fully-observed ensemble of neural spike trains 
{ni{t)}i<j\[, has recently received a great deal of attention; see the refer- 
ences above for a partial list. In the calcium fluorescent imaging setting, 
however, we do not directly observe spike trains; {ni{t)}i<N must be con- 
sidered a hidden variable here. Instead, each spike in a given neuron leads to 
a rapid increase in the intracellular calcium concentration, which then de- 
cays slowly due to various cellular buffering and extrusion mechanisms. We 
in turn make only noisy, indirect, and subsampled observations of this in- 
tracellular calcium concentration, via fluorescent imaging techniques [Yuste 
et al. (2006)]. To perform statistical inference in this setting, [Vogelstein 
et al. (2009)] proposed a simple conditional flrst-order hidden Markov model 
(HMM) for the intracellular calcium concentration Ci{t) in cell i at time t, 
along with the observed fluorescence, Fi{t): 



(4) Ci{t) = Ci{t - A) + {Ct - Ci{t - A))A/rf + Aimit) + a^VAe'^it), 



(5) Fi{t) = aiS{a{t))+^i + ^{a[y + jiS{a{t))ef{t). 
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This model can be interpreted as a simple driven autoregressive process: 
under nonspiking conditions, Cj(t) fluctuates around the baseline level of C\, 



driven by normally-distributed noise £f(t) with standard deviation (t?vA- 
Whenever the neuron fires a spike, nj(t) = 1, the calcium variable Ci{t) 
jumps by a fixed amount Ai, and subsequently decays with time constant t![. 
The fiuorescence signal Fi{t) corresponds to the count of photons collected 
at the detector per neuron per imaging frame. This photon count may be 
modeled with normal statistics, with the mean given by a saturating Hill- 
type function S{C) = C /{C + Kd) [Yasuda et al. (2004)] and the variance 
scaling with the mean; see [Vogelstein et al. (2009)] for further discussion. 
Because the parameter K^ effectively acts as a simple scale factor, and is 
a property of the fluorescent indicator, we assume throughout this work that 
it is known. Figure 3 shows a couple examples depicting the relationship 
between spike trains and observations. It will be useful to define an effective 
SNRas 

(f,. ^^^ ^ i?[F,(t)-F,(t-A)|n,(t) = l] 

^^ ^[(F,(t)-F,(t-A))V2|n,(t) = 0]V2' 

that is, the size of a spike-driven fluorescence jump divided by a rough mea- 
sure of the standard deviation of the baseline fluorescence. For concreteness, 
the effective SNR values in Figure 3 were 9 and 3 in the left and right panels, 
respectively. 

To summarize, equations (l)-(5) deflne a coupled HMM: the underly- 
ing spike trains {ni{t)}i<N and spike history terms {/ijj(t)}ij<Ar evolve in 
a Markovian manner given the stimulus 5"^^*(t). These spike trains in turn 
drive the intracellular calcium concentrations {Cj(t)}i<Ar, which are them- 
selves Markovian, but evolving at a slower timescale rf. Finally, we ob- 
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Fig. 3. Two example traces of simulated fluorescence data, at different SNR levels, 
demonstrating the relationship between spike trains and observed fluorescence in our 
model. Note that both panels have the same underlying spike train. Simulation parameters: 
hi = 0.7, Ct = 1 iJ,M, rf = 500 msec, Ai = 50 p.U, at = 0.1 /xM, 7i = 0.004 [effective SNR 
~ 9, as defined in equation (6); see also Figure 9 below] in the left panel and 7i = 0.016 
(eSNR ~ Z) m the right panel, and af = 0, A = (60 Hz)^^ . 
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serve only the fluorescence signals {-Fi(t)}i<Af, which are related in a simple 
Markovian fashion to the calcium variables {Cj(t)}j<7v- 

2.2. Goal and general strategy. Our primary goal is to estimate the con- 
nectivity matrix, w, given the observed set of calcium fluorescence signals 
F = {Fj}j<7v, where Fj = {Fi{t)}t<T- We must also deal with a number 
of intrinsic parameters,^ 9i: the intrinsic spiking parameters^ {bi,Wii}i<N, 
the calcium parameters {C'^,T^,Ai,a^}i<N, and the observation parameters 
{ai,l3i,'yi,af}i<N- We addressed the problem of estimating these intrin- 
sic parameters in earlier work [Vogelstein et al. (2009)]; thus, our focus 
here will be on the connectivity matrix w. A Bayesian approach is nat- 
ural here, since we have a good deal of prior information about neural 
connectivity; see [Rigat, de Gunst and van Pelt (2006)] for a related dis- 
cussion. However, a fully-Bayesian approach, in which we numerically inte- 
grate over the very high-dimensional parameter space 9 = {9i}i<N, where 
9i = {wi,bi,C'^,T^,Ai,a'^,ai,f3i,ji,af}, is less attractive from a computa- 
tional point of view. Thus, our compromise is to compute maximum a poste- 
riori (MAP) estimates for the parameters via an expectation-maximization 
(EM) algorithm, in which the sufficient statistics are computed by a hybrid 
blockwise Gibbs sampler and sequential Monte Carlo (SMC) method. More 
specifically, we iterate the steps: 

E-step: Evaluate Q(0, 0(')) = ^p[x|F;e(O]lnP[F,X|0] =/P[X|F;eW]lnP[F, 

X\9]dX; 
M-step: Solve 9^^+^^^ = argmaxg{Q{9, 9^''')) + In P{9)}, 

where X denotes the set of all hidden variables {Ci{t),ni{t),hij{t)}ij<]\f^t<T 
and P{9) denotes a (possibly improper) prior on the parameter space 9. Ac- 
cording to standard EM theory [Dempster, Laird and Rubin (1977); McLach- 
lan and Krishnan (1996)], each iteration of these two steps is guaranteed to 
increase the log-posterior \nP{9^'\F), and will therefore lead to at least 
a locally maximum a posteriori estimator. 

Now, our major challenge is to evaluate the auxiliary function Q{9^9^^') 
in the E-step. Our model is a coupled HMM, as discussed in the previous 
section; therefore, as usual in the HMM setting [Rabiner (1989)], Q may be 
broken up into a sum of simpler terms: 



it 



,e«) = j; / lnP[F,(t)|C,(t);ai,ft,7i,af]dP[C,(t)|F;e«] 



*The intrinsic parameters for neuron i are all its parameters minus the cross-coupling 
terms, that is, 9i = di\{wij}i^j . 

^To reduce the notational load, we will ignore the estimation of the stimulus filter ki 
below; this term may be estimated with bi and wu using very similar convex optimization 
methods, as discussed in [Vogelstein et al. (2009)]. 



(7) 
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+ 5^/'lnP[Ci(t)|Q(t-A), 

it 

+ Y^ f\nP[n,{t)\hi{t)-hi,^i]dP[ni{t)MtW,0% 



where hj(t) = {hij{t)}j<_N- Note that each of the three sums here corresponds 
to a different component of the model described in equations (l)-(5): the 
first sum involves the fluorescent observation parameters, the second the 
calcium dynamics, and the third the spiking dynamics. 

Thus, we need only compute low-dimensional marginals of the full pos- 
terior distribution P[X|F;0]; specifically, we need the pairwise marginals 
P[Ci(t)|F;^],P[Ci(i),Ci(t-A)|F;0], and P[ni(t),hi(t)|F;e]. Details for cal- 
culating P[Ci(t),Cj(i- A)|Fi;^i] and P[Ci(i)|Fi;^i] are found in [Vogelstein 
et al. (2009)], while calculating the joint marginal for the high-dimensional 
hidden variable hj necessitates the development of specialized blockwise 
Gibbs-SMC sampling methods, as we describe in the subsequent Sections 2.3 
and 2.4. Once we have obtained these marginals, the M-step breaks up into 
a number of independent optimizations that may be computed in parallel 
and which are therefore relatively straightforward (Section 2.5); see Sec- 
tion 2.6 for a pseudocode summary along with some specific implementation 
details. 

2.3. Initialization of intrinsic parameters via sequential Monte Carlo meth- 
ods. We begin by constructing relatively cheap, approximate preliminary 
estimators for the intrinsic parameters, 6i. The idea is to initialize our es- 
timator by assuming that each neuron is observed independently. Thus, we 
want to compute P[Ci{t),Ci{t - A)|Fi;^j] and P[Ci(t)|Fi;^j], and solve the 
M-step for each ^j, with the connectivity matrix parameters held fixed. This 
single-neuron case is much simpler, and has been discussed at length in [Vo- 
gelstein et al. (2009)]; therefore, we only provide a brief overview here. The 
standard forward and backward recursions provide the necessary posterior 
distributions, in principle [Shumway and Stoffer (2006)]: 

P[X,{t)\Fi{Q:t)\ 

(8) ^P[Fi{t)\Xi{t)] j P[X,{t)\X,{t- ^)] 

xP[Xi(t-A)|F,(0:t-A)]dXi(t-A), 
P[Xi{t),Xi{t-l^)\Yi] 

(9) =P[X,(t)|F,] 



10 Y. MISHCHENKO, J. T. VOGELSTEIN AND L. PANINSKI 

P[Xi{t)\Xi{t-A)]P[X,{t-A)\Fi{0:t-A)] 



X 



f P[Xi{t)\Xi{t - A)]P[X,{t - A)\Fi{0 :t-A)] dXi{t - A) ' 



where Fi[s : t) denotes the time series Fj from time points s to t, and we have 
dropped the conditioning on the parameters for brevity's sake. Equation (8) 
describes the forward (filter) pass of the recursion, and equation (9) describes 
the backward (smoother) pass, providing both i-*[Xj(t),Xj(i — A)|Fj] and 
P[Xj(i)|Fj] [obtained by marginalizing over Xi{t — A)]. 

Because these integrals cannot be analytically evaluated for our model, 
we approximate them using a SMC ("marginal particle filtering") method 
[Doucet, Godsill and Andrieu (2000); Doucet, de Freitas and Gordon (2001); 
Godsill, Doucet and West (2004)]. More specifically, we replace the forward 
distribution with a particle approximation: 

M 

(10) P[X,(t)|F,(0:t)]«5]p('")(t)<5[X,(t)-X('")(t)], 

m=l 

where m = l,...,Af indexes the M particles in the set (M was typically 
set to about 50 in our experiments), pV^ [t) corresponds to the relative 

"forward" probability of Xi{t) = Xj (t), and 5[-] indicates a Dirac mass. 
Instead of using the analytic forward recursion, equation (8), at each time 
step, we update the particle weights using the particle forward recursion 

, , , , P[xf'")(i)|xi"^(i-A)yrVt-A) 

Qi^i it)] 

where q[X^'^ (t)] is the proposal density from which we sample the par- 
ticle positions X^ (t). In this work we use the "one-step-ahead" sam- 
pler [Doucet, Godsih and Andrieu (2000); Vogelstein et al. (2009)], that 
is, q[X^"'\t)] = P[X^"^\t)\X^'^\t - A),Fi{t)]. After sampling and comput- 
ing the weights, we use stratified resampling [Done, Cappe and Moulines 
(2005)] to ensure the particles accurately approximate the desired distribu- 
tion. Once we complete the forward recursion from t = 0, . . . ,T, we begin 
the backward pass from t = T, . . . , 0, using 

M,.. P[xt\t)\xt'\t-A)]p^p\t-A) 



(12) r(™''"')(t,i-A)=pP(t) 



Em'P[xrit)\xr'it-^)]pT'it-^y 



M 

(13) p("^')(i-A) = ^r(™''"')(t,i-A), 
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to obtain the approximation 

P[X,it),Xi{t-A)\F,] 

(14) « Y. rt^"''\t,t-A)6[Mt)-Xt\t)] 

m,m' 

X S[Xi{t - A) - Xt'\t - A)] 

for more details, see [Vogelstein et al. (2009)]. Thus, equations (10)-(14) 
may be used to compute the sufficient statistics for estimating the intrinsic 
parameters 6i for each neuron. 

As discussed following equation (7), the M-step decouples into three in- 
dependent subproblems. The first term depends on only {ai,/3j,7j,o"i}; since 
P[Fi{t)\S{Ci{t)); 6i] is Gaussian, we can estimate these parameters by solv- 
ing a weighted regression problem (specifically, we use a coordinate-optimi- 
zation approach: we solve a quadratic problem for {aj,/3i} while holding 
{7i,(Tj} fixed, then estimate {7i,o"i} by the usual residual error formulas 
while holding {aj,/3j} fixed). Similarly, the second term requires us to opti- 
mize over {rf,Ai,Cj^}, and then we use the residuals to estimate af. Note 
that all the parameters mentioned so far are constrained to be non-negative, 
but may be solved efficiently using standard quadratic program solvers if we 
use the simple reparameterization r ? — )• 1 — A/rf. Finally, the last term may 
be expanded: 

E[lnP[n^{t),hi{t)\F;Bi]] 

(15) = P[n,{t),hi{t)\F;e,]lnf[Ji{t)] 

+ (1 - p[n,{t)Mt)\F; e^])in[i - fiMt))]; 

since Ji{t) is a linear function of {6j,Wj}, and the right-hand side of equa- 
tion (15) is concave in Ji{t), we see that the third term in equation (7) is 
a sum of terms which are concave in {6i,Wj} — and therefore also concave 
in the linear subspace {bi,Wii} with {wij}i^j held fixed — and may thus be 
maximized efficiently using any convex optimization method, for example, 
Newton-Rap hson or conjugate gradient ascent. 

Our procedure therefore is to initialize the parameters for each neuron 
using some default values that we have found to be effective in practice in 
analyzing real data, and then iteratively (i) estimate the marginal posteri- 
ors via the SMC recursions (10)-(14) (E-step), and (ii) maximize over the 
intrinsic parameters 6i (M-step), using the separable convex optimization 
approach described above. We iterate these two steps until the change in 6i 
does not exceed some minimum threshold. We then use the marginal poste- 
riors from the last iteration to seed the blockwise Gibbs sampling procedure 
described below for approximating P[nj,hj|F;^j]. 
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2.4. Estimating joint posteriors over weakly coupled neurons. Now we 
turn to the key problem: constructing an estimate of the joint marginals 
{P[ni{t),hi{t)\F; 0]}i<N,t<T, which are the sufficient statistics for estimating 
the connectivity matrix w [recall equation (7)]. The SMC method described 
in the preceding section only provides the marginal distribution over a sin- 
gle neuron's hidden variables; this method may in principle be extended to 
obtain the desired full posterior P[X(t),X(t- A)|F;6'], but SMC is fun- 
damentally a sequential importance sampling method, and therefore scales 
poorly as the dimensionality of the hidden state X(t) increases [Bickel, Li 
and Bengtsson (2008)]. Thus, we need a different approach. 

One very simple idea is to use a Gibbs sampler: sample sequentially from 

(16) Xi{t)r^P[Xi{t)\X\i,Xi{0),...,X,{t-A),Xi{t + A),...,Xi{T),F;e], 

looping over all cells i and all time bins t. Unfortunately, this approach is 
likely to mix poorly, due to the strong temporal dependence between Xj(i) 
and Xj(t + A). Instead, we propose a blockwise Gibbs strategy, sampling 
one spike train as a block: 

(17) Xi~P[Xi|X\i,F;0]. 

If we can draw these blockwise samples Xj = Xj(s :t) efficiently for a large 
subset of t — s adjacent time-bins simultaneously, then we would expect the 
resulting Markov chain to mix much more quickly than the single-element 
Gibbs chain. This follows due to the weak dependence between Xj and Xj 
when i ^ j, and the fact that Gibbs is most efficient for weakly-dependent 
variables [Robert and Casella (2005)]. 

So, how can we efficiently sample from P[Xj|Xw,F;0]? One attractive 
approach is to try to re-purpose the SMC method described above, which is 
quite effective for drawing approximate samples from P[Xj|X\ j, Fj; 9] for one 
neuron i at a time. Recall that sampling from an HMM is in principle easy 
by the "propagate forward, sample backward" method: we first compute the 
forward probabilities P[Xi{t)\'K\i{0:t),Fi{0:t);9] recursively for timesteps 
i = up to T, then sample backward from P[Xi{t)\X\i{0 : T),Fi{0 : T),Xi{t- 
A);6']. This approach is powerful because each sample requires just linear 
time to compute [i.e., 0{T/A) time, where T/A is the number of desired 
time steps]. Unfortunately, in this case we can only compute the forward 
probabilities approximately (via equations (lO)-(ll)), and so therefore this 
attractive forward-backward approach only provides approximate samples 
from i-*[Xj|X\j,F;0], not the exact samples required for the validity of the 
Gibbs method. 

Of course, in principle, we should be able to use the Metropolis-Hastings 
(M-H) algorithm to correct these approximate samples. The problem is that 
the M-H acceptance ratio in this setting involves a high-dimensional integral 
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over the set of paths that the particle filter might possibly trace out, and 
is therefore difficult to compute directly. [Andrieu, Doucet and Holenstein 
(2007)] discuss this problem at more length, along with some proposed solu- 
tions. A slightly simpler approach was introduced by [Neal, Beal and Roweis 
(2003)]. Their idea is to exploit the 0(T/A) forward-backward sampling 
method by embedding a discrete Markov chain within the continuous state 
space Xt on which Xi{t) is defined; the state space of this discrete embedded 
chain is sampled randomly according to some distribution pt with support on 
Xf. It turns out that an appropriate Markov chain (incorporating the orig- 
inal state space model transition and observation probabilities, along with 
the auxiliary sampling distributions pt) may be constructed quite tractably, 
guaranteeing that the samples produced by this algorithm have the desired 
equilibrium density. See [Neal, Beal and Roweis (2003)] for details. 

We can apply this embedded-chain method directly here to sample from 
P[Xj|Xw,F;0]. The one remaining question is how to choose the auxil- 
iary densities pt- We would like to choose these densities to be close to the 
desired marginal densities P[Xi(t)\'K\i,F; 9], and conveniently, we have al- 
ready computed a good (discrete) approximation to these densities, using 
the SMC methods described in the last section. The algorithm described in 
[Neal, Beal and Roweis (2003)] requires the densities pt to be continuous, 
so we simply convolve our discrete SMC-based approximation [specifically, 
the Xj(t)-marginal of equation (14)] with an appropriate normal density to 
arrive at a very tractable mixture-of-Gaussians representation for pt- 

Thus, to summarize, our procedure for approximating the desired joint 
state distributions P[ni{t),hi{t)\F;6] has a Metropolis-within-blockwise- 
Gibbs flavor, where the internal Metropolis step is replaced by the 0(T/A) 
embedded-chain method introduced by [Neal, Beal and Roweis (2003)], and 
the auxiliary densities pt necessary for implementing the embedded-chain 
sampler are obtained using the SMC methods from [Vogelstein et al. (2009)]. 

2.4.1. A factorized approximation of the joint posteriors. If the SNR in 
the calcium imaging is sufficiently high, then, by definition, the observed fiu- 
orescence data Fi will provide enough information to determine the underly- 
ing hidden variables Xj. Thus, in this case the joint posterior approximately 
factorizes into a product of marginals for each neuron i: 

(18) P[X|F;^]« J]P[X,|F;^,]. 

i<N 

We can take advantage of this because we have already estimated all the 
marginals on the right-hand side using the approximate SMC methods in 
Section 2.3. This factorized approximation entails a significant gain in ef- 
ficiency for two reasons: first, it obviates the need to generate joint sam- 
ples via the expensive blockwise-Gibbs approach described above; and sec- 
ond, because we can easily parallelize the SMC step, inferring the marginals 
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P[Xi{t)\Fi;9i] and estimating the parameters 9i for each neuron on a sepa- 
rate processor. We will discuss the empirical accuracy of this approximation 
in Section 3. 

2.5. Estimating the connectivity matrix. Computing the M-step for the 
connectivity matrix, w, is an optimization problem with on the order of A^^ 
variables. The auxiliary function equation (7) is concave in w, and decom- 
poses into N separable terms that may be optimized independently using 
standard ascent methods. To improve our estimates, we will incorporate 
two sources of strong a priori information via our prior P(w): first, previous 
anatomical studies have established that connectivity in many neuroanatom- 
ical substrates is "sparse," that is, most neurons form synapses with only 
a fraction of their neighbors [Buhl, Halasy and Somogyi (1994); Thompson, 
Girdlestone and West (1988); Reyes et al. (1998); Feldmeyer et al. (1999); 
Gupta, Wang and Markram (2000); Feldmeyer and Sakmann (2000); Pe- 
tersen and Sakmann (2000); Binzegger, Douglas and Martin (2004); Song 
et al. (2005); Mishchenko et al. (2009)], implying that many elements of the 
connectivity matrix w are zero; see also [Paninski (2004); Rigat, de Gunst 
and van Pelt (2006); Pillow et al. (2008); Stevenson et al. (2008)] for further 
discussion. Second, "Dale's law" states that each of a neuron's postsynaptic 
connections in the adult cortex (and many other brain areas) must all be of 
the same sign (either excitatory or inhibitory) . Both of these priors are easy 
to incorporate in the M-step optimization, as we discuss below. 

2.5.1. Imposing a sparse prior on the connectivity. It is well known that 
imposing sparseness via an Li-regularizer can dramatically reduce the amount 
of data necessary to accurately reconstruct sparse high-dimensional parame- 
ters [Tibshirani (1996); Tipping (2001); Donoho and Elad (2003); Ng (2004); 
Candes and Wakin (2008); Mishchenko (2009)]. We incorporate a prior of 
the form Inp('w) = const — A^^ • \wij\, and additionally enforce the con- 
straints \wij\ <L, for a suitable constant L (since both excitatory and in- 
hibitory cortical connections are known to be bounded in size). Since the 
penalty Inp(w) is concave, and the constraints \wij\ < L are convex, we may 
solve the resulting optimization problem in the M-step using standard con- 
vex optimization methods [Boyd and Vandenberghe (2004)]. In addition, the 
problem retains its separable structure: the full optimization may be broken 
up into A^ smaller problems that may be solved independently. 

2.5.2. Imposing Dale's law on the connectivity. Enforcing Dale's law re- 
quires us to solve a nonconvex, nonseparable problem: we need to optimize 
the concave function Q{9,9^') + \nP[9) under the nonconvex, nonseparable 
constraint that all of the elements in any column of the matrix w are of 
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the same sign (either nonpositive or nonnegative) . It is difficult to solve this 
nonconvex problem exactly, but we have found that simple greedy methods 
are quite efficient in finding good approximate solutions. 

We begin with our original sparse solution, obtained as discussed in the 
previous subsection without enforcing Dale's law. Then we assign each neu- 
ron as either excitatory or inhibitory, based on the weights we have in- 
ferred in the previous step: that is, neurons i whose inferred postsynaptic 
connections Wij are largely positive are tentatively labeled excitatory, and 
neurons with largely inhibitory inferred postsynapic connections are labeled 
inhibitory. Neurons which are highly ambiguous may be unassigned in the 
early iterations, to avoid making mistakes from which it might be difficult 
to recover. Given the assignments Oj (oj = 1 for putative excitatory cells, 
—1 for inhibitory, and for neurons which have not yet been assigned), we 
solve the convex, separable problem 

(19) argmax Q{e,e^^^) - X^\wij\, 

aiWij>0,\wij\<LWi,j ■• 

which may be handled using the standard convex methods discussed above. 
Given the new estimated connectivities w, we can re-assign the labels Oj, 
or flip some randomly to check for local optima. We have found this simple 
approach to be effective in practice. 

2.6. Specific implementation notes. Pseudocode summarizing our ap- 
proach is given in Algorithm 1. As discussed in Section 2.3, the intrinsic 
parameters 9i may be initialized effectively using the methods described 
in [Vogelstein et al. (2009)]; then the full parameter 9 is estimated via 
EM, where we use the embedded-chain-within-blockwise-Gibbs approach 
discussed in Section 2.4 (or the cheaper factorized approximation described 
in Section 2.4.1) to obtain the sufficient statistics in the E-step and the sep- 
arable convex optimization methods discussed in Section 2.5 for the M-step. 

As emphasized above, the parallel nature of these EM steps is essen- 
tial for making these computations tractable. We performed the bulk of 
our analysis on a 256-processor cluster of Intel Xeon L5430 based comput- 
ers (2.66 GHz). For 10 minutes of simulated fluorescence data, imaged at 
30 Hz, calculations using the factorized approximation typically took 10- 
20 minutes per neuron (divided by the number of available processing nodes 
on the cluster), with time split approximately equally between (i) estimat- 
ing the intrinsic parameters 6i, (ii) approximating the posteriors using the 
independent SMC method, and (iii) estimating the connectivity matrix, w. 
The hybrid embedded-chain-within-blockwise-Gibbs sampler was substan- 
tially slower, up to an hour per neuron, with the Gibbs sampler dominating 
the computation time, because we thinned the chain by a factor of five, 
following preliminary quantification of the autocorrelation timescale of the 
Gibbs chain (data not shown). 
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Algorithm 1 Pseudocode for estimating connectivity from calcium imaging 
data using EM; r]i and 772 are user-defined convergence tolerance parameters. 

while \w^^^ — w('~-^)| > rji do 
for alH = 1 . . . iV do 

while \0? -Of'^^l >r]2 do 

Approximate P[Xi{t)\Fi;ei] using SMC (Section 2.3) 
Perform the M-step for the intrinsic parameters 9i (Section 2.3) 
end while 
end for 
for alH = 1 . . . A^ do 

Approximate P[nj(i),hj(i)|F;^j] using either the blockwise Gibbs 
method or the factorized approximation (Section 2.4) 
end for 
for alH = 1 . . . Af do 

Perform the M-step for {&.j,w,j}j<7v using separable convex opti- 
mization 

methods (Section 2.5) 
end for 
end while 

2.7. Simulating a neural population. To test the described method for 
inferring connectivity from calcium imaging data, we simulated networks of 
spontaneously firing randomly connected neurons according to our model, 
equations (l)-(5), and also using other network models (see Section 3.4). 
Although simulations ran at 1 msec time discretization, the imaging rate 
was assumed to be much slower: 5-200 Hz (cf. Figure 8 below). 

Model parameters were chosen based on experimental data available in 
the literature for cortical neural networks [Sayer, Friedlande and Redman 
(1990); Braitenberg and Schuz (1998); Gomez-Urquijo et al. (2000); Lefort 
et al. (2009)]. More specifically, the network consisted of 80% excitatory 
and 20% inhibitory neurons [Braitenberg and Schuz (1998); Gomez-Urquijo 
et al. (2000)], each respecting Dale's law (as discussed in Section 2.5 above). 
Neurons were randomly connected to each other in a spatially homogeneous 
manner with probability 0.1 [Braitenberg and Schuz (1998); Lefort et al. 
(2009)]. Synaptic weights for excitatory connections, as defined by excitatory 
postsynaptic potential (PSP) peak amplitude, were randomly drawn from an 
exponential distribution with the mean of 0.5 mV [Lefort et al. (2009); Sayer, 
Friedlande and Redman (1990)]. Inhibitory connections were also drawn 
from an exponential distribution, their strengths chosen so as to balance 
excitatory and inhibitory currents in the network, and achieve an average 
firing rate of ~5 Hz [Abeles (1991)]. Practically, this meant that the mean 
strength of inhibitory connections was about 10 times larger than that of 



INFERRING NEURONAL CONNECTIVITY FROM CALCIUM IMAGING 17 

the excitatory connections. PSP shapes were modeled as an alpha function 
[Koch (1999)]: roughly, the difference of two exponentials, corresponding to 
a sharp rise and relatively slow decay [Sayer, Priedlande and Redman (1990)] . 
We neglected conduction delays, given that the time delays below ~1 msec 
expected in the local cortical circuit were far below the time resolution of 
our simulated imaging data. 

Note that PSP peak amplitudes measured in vitro [as in, e.g.. Song et al. 
(2005)] cannot be incorporated directly in equation (1), since the synap- 
tic weights in our model — Wij in equation (1) — are dimensionless quantities 
representing the change in the spiking probability of neuron i given a spike 
in neuron j, whereas PSP peak amplitude describes the physiologically mea- 
sured change in the membrane voltage of a neuron due to synaptic currents 
triggered by a spike in neuron j. To relate the two, note that in order to 
trigger an immediate spike in a neuron that typically has its membrane volt- 
age Vfe ™V below the spiking threshold, roughly ue = Vb/VE simultaneous 
excitatory PSPs with the peak amplitude Ve would be necessary. Therefore, 
the change in the spiking probability of a neuron due to excitatory synaptic 
current Ve can be approximately defined as 

(20) 6Pe = VE/Vt 

(so that SPetle ~ 1). V^ ~ 15 mV here, while values for the PSP amplitude 
Ve were chosen as described above. Similarly, according to equation (1), the 
same change in the spiking probability of a neuron i following the spike of 
a neuron j in the GLM is roughly 

(21) 6PE = [f{h + W,j)-f{h)]Th, 

where recall Th is the typical PSP time-scale, that is, the time over which 
a spike in neuron j significantly affects the firing probability of the neuron i. 
Equating these two expressions gives us a simple method for converting the 
physiological parameters Ve and 14 into suitable GLM parameters wij. 

Finally, parameters for the internal calcium dynamics and fiuorescence 
observations were chosen according to our experience with several cells an- 
alyzed using the algorithm of [Vogelstein et al. (2009)], and conformed to 
previously published results [Yuste et al. (2006); Helmchen, Imoto and Sak- 
mann (1996); Brenowitz and Regehr (2007)]. Table 1 summarizes the details 
for each of the parameters in our model. 

3. Results. In this section we study the performance of our proposed net- 
work estimation methods, using the simulated data described in Section 2.7 
above. Specifically, we estimated the connectivity matrix using both the 
embedded-chain-within-blockwise-Gibbs approach and the simpler factor- 
ized approximation. Figure 4 summarizes one typical experiment: the EM 
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Table 1 
Table of simulation parameters 



Variable 



Value/distribution 



Unit 



Total neurons 
Excitatory neurons 
Connections sparseness 
Baseline firing rate 

Excitatory PSP peak height 
Inhibitory PSP peak height 
Excitatory PSP rise time 
Inhibitory PSP rise time 
Excitatory PSP decay time 
Inhibitory PSP decay time 
Refractory time, wu 

Calcium std. ac 
Calcium jump after spike, Ac 
Calcium baseline, C(, 
Calcium decay time, Tc 
Dissociation constant, Kd 

Fluorescence scale, a 
Fluorescence baseline, /3 
Signal-dependent noise, 7 
Signal-independent noise, a^ 



10-500 
80 
10 
5 

~£:(0.5) 
~-£:(2.3) 

1 

1 

-A/'o.5(10,2.5) 

~Aro.5(20,5) 

-AAo.5(10,2.5) 

-Aro.4(28,10) 
-Aro.4(80,20) 
~A/'o.4(24,8) 
^7Vo.4(200,60) 
200 



10 
10 



-10"'' 
-4-10- 



# 
% 
% 
Hz 

mV 
mV 
msec 
msec 
msec 
msec 
msec 

pM 
liM 
liM 
msec 
pM 

n/a 
n/a 
n/a 
n/a 



£{\) indicates an exponential distribution with mean A, and J\fp{fi,a^) indicates a normal 
distribution with mean jj. and variance a^, truncated at lower bound pfj,. Units (when 
applicable) are given with respect to mean values (i.e., units are squared for variance). 



algorithm using the factorized approximation estimated the connectivity ma- 
trix about as accurately as the full embedded-chain-within-blockwise-Gibbs 
approach (r^ = 0.47 vs. r^ = 0.48). Thus, in the following we will focus pri- 
marily on the factorized approximation, since this is much faster than the 
full blockwise-Gibbs approach (recall Section 2.6). 



3.1. Impact of coarse time discretization of calcium, imaging data and 
scale factor of inferred connection weights. A notable feature of the results 
illustrated in the left panel of Figure 4 is that our estimator is biased down- 
ward by a roughly constant scale factor: our estimates Wij are approximately 
linearly related to the true values of Wij in the simulated network, but the 
slope of this linear relationship is less than one. At first blush, this bias does 
not seem like a major problem: as we discussed in Section 2.7, even in the 
noiseless case we should at best expect our estimated coupling weights Wij 
to correspond to some monotonically increasing function of the true neural 
connectivities, as measured by biophysical quantities such as the peak PSP 
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Fig. 4. Quality of the connectivity matrix estimated from simulated calcium imaging 
data. Inferred connection weights Wij are shown in a scatter plot versus real connec- 
tion weights Wij , with inference performed using the factorized approximation, exact em- 
bedded-chain-within-blockwise-Gibbs approach, and true spike trains down-sampled to the 
frame rate of the calcium imaging. A network of N = 25 neurons was used, firing atfuS Hz, 
and imaged for T =1Q mm at 60 Hz with intermediate eSNR « 6 [see equation (6) and 
Figure 9 below]. The squared correlation coefficient between the connection weights calcu- 
lated using the factorized approximation and true connection weights was r = 0.47, com- 
pared with the embedded- cham-within-blockwise-Gibbs method's r^ = 0.48. For connection 
weights calculated directly from the true spike train down-sampled to the calcium imaging 
frame rate, we obtained r^ = 0.57. (For comparison, r^ = 0.71 for the connectivity matrix 
calculated using the full spike trains with 1 ms precision; data not shown.) Here and in the 
following figures the gray dashed line indicates unity, y = x. The inferred connectivity in 
the left panel shows a clear scale bias, which can be corrected by dividing by the scale cor- 
rection factor calculated in Section 3.1 below (right panel). The vertical lines apparent at 
zero in both subplots are due to the fact that the connection probability in the true network 
was significantly less than one: that is, many of the true weights Wij are exactly zero. 



amplitude. Nonetheless, we would like to understand the source of this bias 
more quantitatively; in this section we discuss this issue in more depth and 
derive a simple method for correcting the bias. 

The bias is largely due to the fact that we suffer a loss of temporal res- 
olution when we attempt to infer spike times from slowly-sampled fluores- 
cence data. As discussed in [Vogelstein et al. (2009)], we can recover some 
of this temporal information by using a finer time resolution for our recov- 
ered spike trains than A, the time resolution of the observed fluorescence 
signal. However, when we attempted to infer w directly spike trains sampled 
from the posterior P[X|F] at higher-than-A resolution, we found that the 
inferred connectivity matrix was strongly biased toward the symmetrized 
matrix (w + w )/2 (data not shown). In other words, whenever a nearly 
synchronous jump was consistently observed in two fluorescent traces Fi{t) 
and Fj[t) (at the reduced time resolution A), the EM algorithm would typ- 
ically infer an excitatory bidirectional connection: that is, both Wij and Wji 
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would be large, even if only a unidirectional connection existed between neu- 
rons i and j in the true network. While we expect, by standard arguments, 
that the Monte Carlo EM estimator constructed here should be consistent 
(i.e., we should recover the correct w in the limit of large data length T 
and many Monte Carlo samples), we found that this bias persisted given 
experimentally-reasonable lengths of data and computation time. 

Therefore, to circumvent this problem, we simply used the original imag- 
ing time resolution A for the inferred spike trains: note that, due to the 
definition of the spike history terms hij in equation (3), a spike in neuron j 
at time t will only affect neuron i's firing rate at time i + A and greater. 
This successfully counteracted the symmetrization problem (and also sped 
the calculations substantially), but resulted in the scale bias exhibited in 
Figure 4, since any spikes that fall into the same time bin are treated as 
coincidental: only spikes that precede spikes in a neighboring neuron by at 
least one time step will directly affect the estimates of Wij, and therefore 
grouping asynchronous spikes within a single time bin A results in a loss of 
information. 

To estimate the magnitude of this time-discretization bias more quantita- 
tively, we consider a significantly simplified case of two neurons coupled with 
a small weight Wi2, and firing with baseline firing rate of r = /(6). In this 
case an approximate sufficient statistic for estimating wi2 may be defined 
as the expected elevation in the spike rate of neuron one on an interval of 
length T, following a spike in neuron two: 

r /■*'+'^ 

SS = E ni{t)dt\7i2{t') = l,n2it)=0yte{t',t' + T] 

(22) 

^rT+ f'{b)wi2Th, 

where f'{b) represents the slope of the nonlinear function /(•) at the base- 
line level b. This approximation leads to a conceptually simple method-of- 
moments estimator. 



(23) 



wi2 = iSS-rT)/f'{b)Th. 



Now, if the spike trains are down-sampled into time-bins of size A, we must 
estimate the statistic SS with a discrete sum instead: 



SS' 



ds 



E 



■f+A+r 

, t=t'+A 



n. 



t{t') 



l,nf(i) = 0VtG(t',i' + 7l 



(24) 



-■A J./ /.A+r 



r^ dv r^+' 
'■r+f'{b)J —J wueM-{t-t')/Th)dt 

l-exp(-A/r/,) 



■rT+f'{b)wi2- 



Nrl 
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Fig. 5. The low frame rate of calcium imaging explains the scale error observed in the 
inferred connectivity weights shown in Figure 4. A correction scale factor may he calculated 
analytically (thick line) as discussed in the main text [equation (25)]. The scale error 
observed empirically (thin line) matches well with this theoretical estimate. In the latter 
case, the scale error was calculated from the fits obtained directly from the true spike 
trains, down sampled to different A, for a network of N — 25 neurons firing at ~5 Hz and 
observed for T — 10 min. The error-bars indicate 95% confidence intervals for scale error 
at each A. 



n 



ds 



[t) here are down-sampled spikes, that is, the spikes defined on a grid 

t = 0, A, 2A, In the second equahty we made the approximation that the 

true position of the spike of the second neuron, 712^ [t'), may be uniformly 
distributed in the first time-bin [0, A], and the discrete sum over t is from 
the second time-bin [A,2A] to \T,'T + A], that is, over all spikes of the 
first neuron that occurred in any of the strictly subsequent time-bins up to 
7" + A. Forming a method-of-moments estimator as in equation (23) leads 
to a biased estimate, 



(25) 



^^ds _ l-exp(-A/rfe) 



Nrh 



and somewhat surprisingly (given the rather crude nature of these approx- 
imations), this corresponds quite well with the scale bias we observe in 
practice. In Figure 5 we plot the scale bias from equation (25) versus that 
empirically deduced from our simulations for different values of A; we see 
that equation (25) describes the observed scale bias fairly well. Thus, we 
can divide by this analytically-derived factor to effectively correct the bias 
of our estimates, as shown in the right panel of Figure 4. 



3.2. Impact of prior information on the inference. Next we investigated 
the importance of incorporating prior information in our estimates. We 
found that imposing a sparse prior (as described in Section 2.5) significantly 
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Fig. 6. Imposing a sparse prior on connectivity improves our estimates. Scatter plots 
indicate the connection weights Wij reconstructed using no prior (r"^ = 0.64; left panel) 
and a sparse prior (r^ — 0.85; right panel) vs. the true connection weights in each case. 
These plots were based on a simulation of N = 50 neurons firing at ~5 Hz, imaged for 
T = 10 min at 60 Hz, with eSNR ~ 10. Clearly, the sparse prior reduces the relative error, 
as indicated by comparing the relative distance between the data points (black dots) to the 
best linear fit (gray dash-dotted line), at the expense of some additional soft-threshold bias, 
as is usual in the Li setting. 

improved our results. For example, Figure 6 illustrates a case in which our 
obtained r^ increased from 0.64 (with no Li penalization in the M-step) to 
0.85 (with penalization; the penalty A was chosen approximately as the in- 
verse mean absolute value of Wij , which is known here because we prepared 
the network simulations, but is available in practice given the previous phys- 
iological measurements discussed in Section 2.7). See also Figure 10 below. 
Furthermore, the weights estimated using the sparse prior more reliably pro- 
vide the sign (i.e., excitatory or inhibitory) of each presynaptic neuron in 
the network (Figure 7). 

Incorporation of Dale's law, on the other hand, only leads to an ~10% 
change in the estimation r^ in the absence of an Li penalty, and no significant 
improvement at all in the presence of an Li penalty (data not shown). Thus, 
Dale's prior was not pursued further here. 

3.3. Impact of experimental factors on estimator accuracy. Next we 
sought to quantify the minimal experimental conditions necessary for ac- 
curate estimation of the connectivity matrix. Figure 8 shows the quality of 
the inferred connectivity matrix as a function of the imaging frame rate, and 
indicates that imaging frame rates > 30 Hz are needed to achieve meaningful 
reconstruction results. This matches nicely with currently-available technol- 
ogy; as discussed in the introduction, 30 or 60 Hz imaging is already in 
progress in a number of laboratories [Nguyen et al. (2001); Iyer, Hoogland 
and Saggau (2006); Salome et al. (2006); Reddy et al. (2008)], though in 
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Fig. 7. The distributions of inferred connection weights using no prior (left panel) 
and a sparse prior (right panel) vs. true distributions. When the sparse prior is en- 
forced, zero weights are recovered with substantially higher frequency (black lines), thus 
allowing better identification of connected neural pairs. Likewise, excitatory and in- 
hibitory weights are better recognized (red and blue lines, respectively) , thus allowing 
accurate classification of neurons as excitatory or inhibitory. The normalized Ham- 
ming distance between the inferred and true connectivity matrix here (defined as 
H{'w,'w) = [A^(A'" — l)]~^'}2i- |sign(TOij) — sign(TOij)|, with the convention sign(O) — 0) 
was 0.06. Distributions are shown for a simulated population of N — 200 neurons fir- 
ing at ~5 Hz and imaged for T = 10 min at 60 Hz, with eSNR ^10. Note that the peak at 
zero in the true distributions (black dashed trace) corresponds to the vertical line visible at 
zero in Figures 4 and 6; inferred distributions were rescaled to remove bias, as described 
in Section 3.1. 



some cases higher imaging rates come at a cost in the signal-to-noise ratio 
of the images or in the number of neurons that may be imaged simultane- 
ously. Similarly, Figure 9 illustrates the quality of the inferred connectivity 
matrix as a function of the effective SNR measure defined in equation (6). 

Finally, Figure 10 shows the quality of the inferred connectivity matrix 
as a function of the experimental duration. The minimal amount of data 
for a particular r^ depended substantially on whether the sparse prior was 
enforced. In particular, when not imposing a sparse prior, the calcium imag- 
ing duration necessary to achieve r^ = 0.5 for the reconstructed connec- 
tivity matrix in this setting was T ~ 10 min, and r^ = 0.75 was achieved at 
T ~ 30 min. With a sparse prior, r^ > 0.7 was achieved already at T ~ 5 min. 
Furthermore, we observed that the accuracy of the reconstruction did not de- 
teriorate dramatically with the size of the imaged neural population: roughly 
the same reconstruction quality was observed (given a fixed length of data) 
for A^ varying between 50-200 neurons. These results were consistent with 
a rough Fisher information computation which we performed but have omit- 
ted here to conserve space. 
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Fig. 8. Accuracy of the inferred connectivity as a function of the frame rate of calcium 
imaging. A population of N = 25 neurons firing at ~5 Hz and imaged for T = 10 min was 



sim,ulated here, 
with A — >• 0. 



with eSNR ~ 10. At 100 Hz, r saturated at the level r « 0.7 achieved 



0.8 



0.6 



'S- 



0.4 



0.2 



—•—66Hz 

-•-33Hz 

■ •■ 15Hz 






- 


9 






J9 " 




..---. 




i *i i • "i ' i 1 





3456789 10 11 
Effective signal to noise ratio (unitless) 



12 



Fig. 9. Accuracy of inferred connectivity as a function of effective imaging SNR [eSNR, 
defined in equation (6)/, for frame rates of 15, 33 and 66 Hz. Neural population simulation 
was the same as in Figure 8. Vertical black lines correspond to the eSNR values of the two 
example traces in Figure 3, for comparison. 



3.4. Impact of strong correlations and deviations from generative model on 
the inference. Estimation of network connectivity is fundamentally rooted 
in observing changes in the spike rate conditioned on the state of the other 
neurons. Considered from the point of view of estimating a standard GLM, 
it is clear that the inputs to our model (1) must satisfy certain basic identi- 
fiability conditions if we are to have any hope of accurately estimating the 
parameter w. In particular, we must rule out highly multicollinear inputs 
{hij{t)}: speaking roughly, the set of observed spike trains should be rich 
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Fig. 10. Accuracy of inferred connectivity as a function of the imaging time and neu- 
ral population size. Incorporating a sparse prior dramatically increases the reconstruction 
quality (dashed lines). When the sparse prior is imposed, T — 5 min is sufficient to re- 
cover 70% of the variance in the connection weights. Incorporating Dale 's prior leads to 
only marginal improvement (dotted line). Furthermore, reconstruction accuracy does not 
strongly depend on the neural population size, N. Here, neural populations of size N = 100 
and 200 are shown (black and gray, respectively), with eSNR ~ 10 and 60 Hz imaging rate 
in each case. 

enough to span all N dimensions of Wj, for each cell i. In the simulations 
pursued here, the coupling matrix {wij}i^j was fairly weak and neurons 
fired largely independently of each other: see Figure 11, upper left, for an 
illustration. In this case of weakly-correlated firing, the inputs {hij{t)} will 
also be weakly correlated, and the model should be identifiable, as indeed we 
found. Should this weak-coupling condition be violated, however (e.g., due 
to high correlations in the spiking of a few neurons), we may require much 
more data to obtain accurate estimates due to multicollinearity problems. 

To explore this issue, we carried out a simulation of a hypothetical strongly 
coupled neural network, where, in addition to the physiologically-relevant 
weak sparse connectivity discussed in Section 2.7, we introduced a sparse 
random strong connectivity component. More specifically, we allowed a frac- 
tion of neurons to couple strongly to the other neurons, making these "com- 
mand" neurons which in turn could strongly drive the activity of the rest of 
the population [MacLean et al. (2005)]. The strength of this strong connec- 
tivity component was chosen to dynamically build up the actual firing rate 
from the baseline rate of f(b) ~ 1 Hz to approximately 5 Hz. Such a network 
showed patterns of activity very different from the weakly coupled networks 
inspected above (Figure 11, top right). In particular, a large number of 
highly correlated events across many neurons were evident in this network. 
As expected, our algorithm was not able to identify the true connectivity 
matrix correctly in this scenario (Figure 11, bottom right panel). For ease of 
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Fig. 11. Diversity of observed neural activity patterns is required for accurate circuit in- 
ference. Here, 15 sec of simulated spike trains for a weakly coupled network (top left panel) 
and a network with strongly coupled component (top right panel) are shown. In weakly cou- 
pled networks, spikes are sufficiently uncorrelated to give access to enough different neural 
activity patterns to estimate the weights "w. In a strongly coupled case, many highly syn- 
chronous events are evident (top right panel), thus preventing observation of a sufficiently 
rich ensemble of activity patterns. Accordingly, the connectivity estimates for the strongly 
coupled neural network (bottom right panel) do not represent the true connectivity of the 
circuit, even for the weakly coupled component. This is contrary to the weakly- coupled 
network (bottom left panel) where true connectivity is successfully obtained. Networks of 
N — 50 neurons firing at ^5 Hz and imaged for T = 10 min at 60 Hz were used to produce 
this figure; eSNR ~ 10. 



comparison, the left panels show a "typical" network (i.e., one lacking many 
strongly coupled neurons), and its associated connectivity inference. 

On the other hand, our inference algorithm showed significant robustness 
to model misspecifcation, that is, deviations from our generative model. One 
important such deviation is variation in the time scales of PSPs in different 
synapses. Up to now, all PSP time-scales were assumed to be the same, that 
iS) {'''ij}i,j<N = Th- In Figure 12 we introduce additional variability in Th 
from one neuron to another. Variability in Th results in added variance in 
the estimates of the connectivity weights, Wij, through the T/j-dependence 
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Fig. 12. Inference is robust to deviations of the data from our generative model. With 
up to 25% variability allowed in PSP time scales t^ (right panel), our algorithm provided 
reconstructions of almost the same quality as when all th 's were the same (left panel). 
Simulation conditions were the same as in Figure 8, at 60 Hz imaging rate. 



of the scaling factor equation (25). However, we found that this additional 
variance was relatively insignificant in cases where r^ varied up to 25% from 
neuron to neuron. We also found that inference was robust to changes in 
the sparseness of the underlying connectivity matrix: we simulated neural 
populations of size N = 25 and A^ = 50 neurons, as above, with connec- 
tion sparseness varying from 5% (very sparse) to 100% (all-to-all), and in 
all cases the performance of our algorithm remained stable, with r^ ~ 0.9 
for the estimate of the connected weights, Wij ^ (data not shown). Fi- 
nally, simulations with more biophysically-based conductance-driven noisy 
integrate-and-fire network models [Vogels and Abbott (2005)] led to qualita- 
tively similar results, further establishing the robustness of these methods; 
again, details are omitted to conserve space. 

4. Discussion. In this paper we develop a Bayesian approach for infer- 
ring connectivity in a network of spiking neurons observed using calcium 
fluorescent imaging. A number of previous authors have addressed the prob- 
lem of inferring neuronal connectivity given a fully-observed set of spike 
trains in a network [Brillinger (1988); Chornoboy, Schramm and Karr (1988); 
Brillinger (1992); Paninski et al. (2004); Paninski (2004); Truccolo et al. 
(2005); Rigat, de Gunst and van Pelt (2006); Nykamp (2007); Kulkarni and 
Paninski (2007); Vidne et al. (2009); Stevenson et al. (2009); Garofalo et al. 
(2009); Cocco, Leibler and Monasson (2009)], but the main challenge in the 
present work is the indirect nature of the calcium imaging data, which pro- 
vides only noisy, low-pass filtered, temporally sub-sampled observations of 
spikes of individual neurons. To solve this problem, we develop a special- 
ized blockwise-Gibbs sampler that makes use of an embedded Markov chain 
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method due to [Neal, Beal and Roweis (2003)]. The connectivity matrix is 
then inferred in an EM framework; the M-step parallehzes quite efficiently 
and ahows for the easy incorporation of prior sparseness information, which 
significantly reduces data requirements in this context. We have found that 
these methods can effectively infer the connectivity in simulated neuronal 
networks, given reasonable lengths of data, computation time, and assump- 
tions on the biophysical network parameters. 

To our knowledge, we are the first to address this problem using the sta- 
tistical deconvolution methods and EM formulation described here [though 
see also Roxin, Hakim and Brunei (2008), who fit simplified, low temporal 
resolution transition-based models to the 10 Hz calcium data obtained by 
Ikegaya et al. (2004)]. However, we should note that [Rigat, de Gunst and 
van Pelt (2006)] developed a closely related approach to infer connectivity 
from low-SNR electrical recordings involving possibly-misclassified spikes (in 
contrast to the slow, lowpass- filtered calcium signals we discuss here) . In par- 
ticular, these authors employed a very similar Bernoulli GLM and developed 
a Metropolis- within- Gibbs sampler to approximate the necessary sufficient 
statistics for their model. In addition [Rigat, de Gunst and van Pelt (2006)] 
develop a more intricate hierarchical prior for the connectivity parameter 
■w; while we found that a simple Li penalization was quite effective here, it 
will be worthwhile to explore more informative priors in future work. 

A number of possible improvements of our method are available. One of 
the biggest challenges for inferring neural connectivity from functional data 
is the presence of indirect inputs from unobserved neurons [Nykamp (2005); 
Nykamp (2007); Kulkarni and Paninski (2007); Vidne et al. (2009); Vakorin, 
Krakovska and Mcintosh (2009)]: it is typically impossible to observe the 
activity of all neurons in a given circuit, and correlations in the unobserved 
inputs can mimic connections among different observed neurons. Developing 
methods to cope with such unobserved common inputs is currently an area 
of active research, and should certainly be incorporated in the methods we 
have developed here. 

Several other important directions for future work are worth noting. First, 
recently-developed photo-stimulation methods for activating or deactivating 
individual neurons or sub-populations [Boyden et al. (2005); Szobota et al. 
(2007); Nikolenko et al. (2011)] may be useful to increase statistical power 
in cases where the circuit's unperturbed activity may not allow reliable de- 
termination of a circuit's connectivity matrix; in particular, by utilizing 
external stimulation, we can in principle choose a sufficiently rich experi- 
mental design (i.e., a sample of input activity patterns) to overcome the 
multicollinearity problems discussed in the context of Figure 11. 

Second, improvements of the algorithms for faster implementation are 
under development. Specifically, fast nonnegative optimization-based decon- 
volution methods may be a promising alternative [Vogelstein et al. (2008); 
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Paninski et al. (2009)] to the SMC approach used here. In addition, modifi- 
cations of our generative model to incorporate nonstationarities in the fluo- 
rescent signal (e.g., due to dye bleaching and drift) are fairly straightforward. 

Third, a fully Bayesian algorithm for estimating the posterior distribu- 
tions of all the parameters (instead of just the MAP estimate) would be of 
significant interest. Such a fully-Bayesian extension is conceptually simple: 
we just need to extend our Gibbs sampler to additionally sample from the pa- 
rameter 9 given the sampled spike trains X. Since we already have a method 
for drawing X given 9 and F, with such an additional sampler we may ob- 
tain samples from P(X.,9\F) simply by sampling from X ~ P(X.\9,F) and 
0~P(^|X), via blockwise-Gibbs. Sampling from the posteriors P(0|X) in 
the GLM setting is quite tractable using hybrid Monte Carlo methods, since 
all of the necessary posteriors are log-concave [Ishwaran (1999); Gamerman 
(1997); Gamerman (1998); Ahmadian, Pillow and Paninski (2011)]. 

Finally, most importantly, we are currently applying these algorithms in 
preliminary experiments on real data. Checking the accuracy of our esti- 
mates is of course more challenging in the context of nonsimulated data, 
but a number of methods for partial validation are available, including 
multiple-patch recordings [Song et al. (2005)], photo stimulation techniques 
[Nikolenko, Poskanzer and Yuste (2007)], and fluorescent anatomical mark- 
ers which can distinguish between different cell types [Meyer et al. (2002)] 
(i.e., inhibitory vs. excitatory cells; cf. Figure 7). We hope to present our 
results in the near future. 
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